(54) Method and apparatus for video size conversion



(57) Methods and apparatus for performing 2:1 downscaling on video data are provided. At least one
input matrix of NxN (e.g., N=16) Discrete Cosine Transform (DCT) coefficients is formed from the video
data by combining four N/2xN/2 field-mode DCT blocks. Vertical downsampling and de-interlacing are
performed on the input matrix to obtain two N/2xN/2 frame-mode DCT blocks. An NxN/2 input matrix is
formed from the two frame-mode DCT blocks. Horizontal downsampling is performed on the NxN/2 matrix
to obtain one N/2xN/2 frame-mode DCT block.
Description

BACKGROUND OF THE INVENTION 

[0001] The present invention relates to compression of multimedia data and, in particular, to a video transcoder that
allows a generic MPEG-4 decoder to decode MPEG-2 bitstreams. Temporal and spatial size conversion (downscaling)
are also provided.

[0002] The following acronyms and terms are used: 

CBP - Coded Block Pattern

DCT - Discrete Cosine Transform

DTV - Digital Television

DVD - Digital Video Disc

HDTV - High Definition Television

FLC - Fixed Length Coding

IP - Internet Protocol

MB - Macroblock

ME - Motion Estimation

ML - Main Level

MP - Main Profile

MPS - MPEG-2 Program Stream

MTS - MPEG-2 Transport Stream

MV - Motion Vector

QP - Quantization Parameter

PMV - Prediction Motion Vector

RTP - Real-Time Transport Protocol (RFC 1889)

SDTV - Standard Definition Television

SIF - Standard Intermediate Format

SVCD - Super Video Compact Disc

VLC - Variable Length Coding

VLD - Variable Length Decoding

VOP - Video Object Plane

[0003] MPEG-4, the multimedia coding standard, provides a rich functionality to support various applications,
including Internet applications such as streaming media, advertising, interactive gaming, virtual traveling, etc.
Streaming video over the Internet (multicast), which is expected to be among the most popular applications for the
Internet, is also well-suited for use with the MPEG-4 visual standard (ISO/IEC 14496-2 Final Draft of International
Standard (MPEG-4), "Information Technology - Generic coding of audio-visual objects, Part 2: visual," Dec. 1998).
[0004] MPEG-4 visual handles both synthetic and natural video, and accommodates several visual object types,
such as video, face, and mesh objects. MPEG-4 visual also allows coding of an arbitrarily shaped object so that multiple
objects can be shown or manipulated in a scene as desired by a user. Moreover, MPEG-4 visual is very flexible in
terms of coding and display configurations by including enhanced features such as multiple auxiliary (alpha) planes,
variable frame rate, and geometrical transformations (sprites).

[0005] However, the majority of the video material (e.g., movies, sporting events, concerts, and the like) which is
expected to be the target of streaming video is already compressed by the MPEG-2 system and stored on storage
media such as DVDs, computer memories (e.g., server hard disks), and the like. The MPEG-2 System specification
(ISO/IEC 13818-2 International Standard (MPEG-2), "Information Technology - Generic coding of Moving Pictures and
Associated Audio: Part 2 - Video," 1995) defines two system stream formats: the MPEG-2 Transport Stream (MTS)
and the MPEG-2 Program Stream (MPS). The MTS is tailored for communicating or storing one or more programs of
MPEG-2 compressed data and also other data in relatively error-prone environments. One typical application of MTS
is DTV. The MPS is tailored for relatively error-free environments. Popular applications include DVD and SVCD.
[0006] Attempts to address this issue have been unsatisfactory to date. For example, the MPEG-4 studio profile (O.
Sunohara and Y. Yagasaki, "The draft of MPEG-4 Studio Profile Amendment Working Draft 2.0," ISO/IEC
JTC1/SC29/WG11 MPEG99/5135, Oct. 1999) has proposed an MPEG-2 to MPEG-4 transcoder, but that process is not
applicable to the other MPEG-4 version 1 profiles, which include the Natural Visual profiles (Simple, Simple Scaleable,
Core, Main, N-Bit), Synthetic Visual profiles (Scaleable Texture, Simple Face Animation), and Synthetic/Natural Hybrid
Visual (Hybrid, Basic Animated Texture). The studio profile is not applicable to the Main Profile of MPEG-4 version 1
since it modifies the syntax, and the decoder process is incompatible with the rest of the MPEG-4 version 1 profiles.
[0007] The MPEG standards designate several sets of constrained parameters using a two-dimensional ranking
order. One of the dimensions, called the "profile" series, specifies the coding features supported. The other dimension,
called "level", specifies the picture resolutions, bit rates, and so forth, that can be accommodated.
[0008] For MPEG-2, the Main Profile at Main Level, or MP@ML, supports a 4:2:0 color subsampling ratio, and I, P
and B pictures. The Simple Profile is similar to the Main Profile but has no B-pictures. The Main Level is defined for
ITU-R 601 video, while the Simple Level is defined for SIF video.

[0009] Similarly, for MPEG-4, the Simple Profile contains SIF progressive video (and has no B-VOPs or interlaced 
video). The Main Profile allows B-VOPs and interlaced video. 

[0010] Accordingly, it would be desirable to achieve interoperability among different types of end-systems by the use
of MPEG-2 video to MPEG-4 video transcoding and/or MPEG-4 video to MPEG-2 video transcoding. The different
types of end-systems that should be accommodated include:

[0011] Transmitting Interworking Unit (TIU): Receives MPEG-2 video from a native MTS (or MPS) system and
transcodes to MPEG-4 video and distributes over packet networks using a native RTP-based system layer (such as an
IP-based internetwork). Examples include a real-time encoder, an MTS satellite link to Internet, and a video server with
MPS-encoded source material.

[0012] Receiving Interworking Unit (RIU): Receives MPEG-4 video in real time from an RTP-based network and then 
transcodes to MPEG-2 video (if possible) and forwards to a native MTS (or MPS) environment. Examples include an 
Internet-based video server to MTS-based cable distribution plant. 

[0013] Transmitting Internet End-System (TIES): Transmits MPEG-2 or MPEG-4 video generated or stored within
the Internet end-system itself, or received from Internet-based computer networks. Examples include a video server.
[0014] Receiving Internet End-System (RIES): Receives MPEG-2 or MPEG-4 video over an RTP-based internet for 
consumption at the Internet end-system or forwarding to a traditional computer network. Examples include a desktop 
PC or workstation viewing a training video. 

[0015] It would be desirable to determine similarities and differences between MPEG-2 and MPEG-4 systems, and
provide transcoder architectures which yield a low complexity and small error.

[0016] The transcoder architectures should be provided for systems where B-frames are enabled (e.g., main profile), 
as well as a simplified architecture for when B-frames are not used (simple profile). 
[0017] Format (MPEG-2 to MPEG-4) and/or size transcoding should be provided. 

[0018] It would also be desirable to provide an efficient mapping from the MPEG-2 to MPEG-4 syntax, including a
mapping of headers.

[0019] The system should include size transcoding, including spatial and temporal transcoding. 

[0020] The system should allow size conversion at the input bitstream or output bitstream of a transcoder. 

[0021] The size transcoder should convert a bitstream of ITU-R 601 interlaced video coded with MPEG-2 MP@ML
into a simple profile MPEG-4 bitstream which contains SIF progressive video suitable, e.g., for a streaming video
application.

[0022] The system should provide an output bitstream that can fit in the practical bandwidth for a streaming video 
application (e.g., less than 1 Mbps). 

[0023] The present invention provides a system having the above and other advantages. 

SUMMARY OF THE INVENTION

[0024] The invention relates to format transcoding (MPEG-2 to MPEG-4) and size (spatial and temporal) transcoding. 
[0025] A proposed transcoder includes size conversion, although these parameters can be transcoded either at the
input bitstream or the output bitstream. However, it is more efficient to include all kinds of transcoding into the product
version of a transcoder to reduce the complexity since the transcoders share processing elements with each other
(such as a bitstream reader).

[0026] The invention addresses the most important requirements for a transcoder, e.g., the complexity of the system 
and the loss generated by the process. 

[0027] In one embodiment, a proposed front-to-back transcoder architecture reduces complexity because there is
no need to perform motion compensation.

[0028] In a particular embodiment, the transcoder can use variable 5-bit QP representation, and eliminates AC/DC 
prediction and the nonlinear DC scaler. 

[0029] The invention is alternatively useful for rate control and resizing.

[0030] A particular method for transcoding a pre-compressed input bitstream that is provided in a first video coding
format includes the steps of: recovering header information of the input bitstream; providing corresponding header
information in a second, different video coding format; partially decompressing the input bitstream to provide partially
decompressed data; and recompressing the partially decompressed data in accordance with the header information
in the second format to provide the output bitstream.
[0031] A method for performing 2:1 downscaling on video data includes the steps of: forming at least one input matrix
of NxN (e.g., N=16) Discrete Cosine Transform (DCT) coefficients from the video data by combining four N/2xN/2
field-mode DCT blocks; performing vertical downsampling and de-interlacing on the input matrix to obtain two N/2xN/2
frame-mode DCT blocks; forming an NxN/2 input matrix from the two frame-mode DCT blocks; and performing horizontal
downsampling on the NxN/2 matrix to obtain one N/2xN/2 frame-mode DCT block.

[0032] Preferably, the vertical and horizontal downsampling use respective sparse downsampling matrixes. In
particular, a vertical downsampling matrix of 0.5[I8 I8] may be used, where I8 is an 8x8 identity matrix. This is essentially
vertical pixel averaging. A horizontal downsampling matrix composed of odd "O" and even "E" matrices may be used.
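
The "vertical pixel averaging" remark can be checked numerically. The following is a minimal sketch, not the patented implementation: it reads the vertical downsampling matrix as 0.5[I8 I8] (two 8x8 identity matrices side by side), and it assumes an orthonormal DCT-II and a hypothetical 16-line by 8-pixel test pattern. The helper names (dct_matrix, matmul, dct2, vertical_downsample) are illustrative only. Because the DCT is linear, applying the matrix to the stacked top-field and bottom-field DCT blocks should match the DCT of the averaged adjacent lines.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so the 2-D DCT of block x is C * x * C^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain-Python matrix product of a (p x q) and b (q x r)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dct2(block):
    """2-D DCT of a square block."""
    c = dct_matrix(len(block))
    ct = [list(col) for col in zip(*c)]
    return matmul(matmul(c, block), ct)

def vertical_downsample(top_dct, bottom_dct):
    """Apply 0.5*[I I] to the stacked field-mode DCT blocks [top; bottom]."""
    n = len(top_dct)
    d = [[0.5 if j % n == i else 0.0 for j in range(2 * n)] for i in range(n)]
    return matmul(d, top_dct + bottom_dct)

# Hypothetical 16-line x 8-pixel interlaced column of a macroblock.
frame = [[(7 * y + 3 * x * x + y * x) % 256 for x in range(8)] for y in range(16)]
top_field, bottom_field = frame[0::2], frame[1::2]

# DCT-domain result versus the DCT of pixel-domain averaging of adjacent lines.
dct_domain = vertical_downsample(dct2(top_field), dct2(bottom_field))
averaged = [[0.5 * (t + b) for t, b in zip(tr, br)]
            for tr, br in zip(top_field, bottom_field)]
pixel_domain = dct2(averaged)
max_error = max(abs(p - q) for pr, qr in zip(dct_domain, pixel_domain)
                for p, q in zip(pr, qr))
```

Here max_error stays at floating-point rounding level, which illustrates why the sparse matrix de-interlaces and halves the vertical resolution in one step. The horizontal "O"/"E" matrices are not reproduced because their entries are not given in this passage.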
[0033] Corresponding apparatuses are also presented. 


BRIEF DESCRIPTION OF THE DRAWINGS 

[0034] The present invention will hereinafter be described in conjunction with the appended drawing figures, wherein 
like reference numerals denote like elements, and: 


FIG. 1 illustrates an MPEG-2 video decoder. 

FIG. 2 illustrates an MPEG-4 video decoder without any scalability feature. 

FIG. 3 illustrates a low complexity front-to-back transcoder (with B frames disabled) in accordance with the
invention.

FIG. 4 illustrates a transcoder architecture that minimizes drift error (with B frames enabled) in accordance with
the invention.

FIG. 5 illustrates a size transcoder in accordance with the invention.

FIG. 6 illustrates downsampling of four field mode DCT blocks to one frame mode DCT block in accordance with
the present invention.


DETAILED DESCRIPTION 

[0035] The ensuing detailed description provides exemplary embodiments only, and is not intended to limit the scope,
applicability, or configuration of the invention. Rather, the ensuing detailed description of the exemplary embodiments
will provide those skilled in the art with an enabling description for implementing an embodiment of the invention. It
should be understood that various changes may be made in the function and arrangement of elements without departing
from the spirit and scope of the invention as set forth in the appended claims.

[0036] The invention relates to format transcoding (MPEG-2 to MPEG-4) and size (spatial and temporal) transcoding. 
[0037] The invention provides bit rate transcoding to convert a pre-compressed bitstream into another compressed
bitstream at a different bit rate. Bit rate transcoding is important, e.g., for streaming video applications because the
network bandwidth is not constant and, sometimes, a video server needs to reduce the bit rate to cope with the network
traffic demand. A cascaded-based transcoder which re-uses MVs from the input bitstream and, hence, eliminates
motion estimation (ME), is among the most efficient of the bit rate transcoders. The cascaded-based transcoder
decodes the input bitstream to obtain the MV and form the reference frame. It then encodes this information with a rate
control mechanism to generate an output bitstream at the desired bit rate.

[0038] Spatial resolution transcoding becomes a big issue with the co-existence of HDTV and SDTV in the near
future. It is also very beneficial for the streaming video application since it is likely that the Internet bandwidth is not
going to be large enough for broadcast quality video. Hence, downsampling of the broadcast quality bitstream into a
bitstream with a manageable resolution is appealing. Spatial resolution transcoding is usually performed in the
compressed (DCT) domain since this drastically reduces the complexity of the system. The process of downsampling in
the compressed domain involves the processing of two parameters, namely DCT coefficients and MVs. A downsampling
filter and its fast algorithm are suggested to perform DCT coefficient downsampling. MV resampling is used to find the
MV of the downsampled video. In the real product, to avoid drift, the residual of the motion compensation should be
re-transformed instead of approximating the DCT coefficients from the input bitstream.


2. High level comparison 

[0039] Structure-wise, MPEG-2 and MPEG-4 employ a similar video compression algorithm. Fundamentally, both
standards adopt motion prediction to exploit temporal correlation and quantization in the DCT domain to use spatial
correlation within a frame. This section describes the structure of the MPEG-2 and MPEG-4 decoders at a high level,
and then notes differences between the two standards.
2.1 MPEG-2

[0040] FIG. 1 shows the simplified video decoding process of MPEG-2. In the decoder 100, coded video data is
provided to a variable length decoding function 110 to provide the one-dimensional data QFS[n], where n is a coefficient
index in the range of 0-63. At the inverse scan function 120, QFS[n] is converted into a two-dimensional array of
coefficients denoted by QF[v][u], where the array indexes u and v both lie in the range 0 to 7. An inverse quantisation
function 130 applies the appropriate inverse quantisation arithmetic to give the final reconstructed, frequency-domain
DCT coefficients, F[v][u]. An inverse DCT function 140 produces the pixel (spatial) domain values f[y][x]. A motion
compensation function 150 is responsive to a frame store memory 160 and the values f[y][x] for producing the decoded
pixels (pels) d[y][x], where y and x are Cartesian coordinates in the pixel domain.
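
The inverse scan step (QFS[n] to QF[v][u]) can be illustrated as follows. This is a minimal sketch that assumes the conventional zigzag scan only; MPEG-2 also defines an alternate scan, which is not reproduced here. The function names zigzag_positions and inverse_scan are hypothetical.

```python
def zigzag_positions(n=8):
    """Return the (v, u) positions of an n x n block in zigzag scan order."""
    def key(pos):
        v, u = pos
        s = v + u                      # index of the anti-diagonal
        # Odd diagonals run top-right to bottom-left (v increasing),
        # even diagonals run bottom-left to top-right (v decreasing).
        return (s, v if s % 2 else -v)
    return sorted(((v, u) for v in range(n) for u in range(n)), key=key)

def inverse_scan(qfs, n=8):
    """Map the one-dimensional list QFS[n] onto the two-dimensional QF[v][u]."""
    if len(qfs) != n * n:
        raise ValueError("expected %d coefficients" % (n * n))
    qf = [[0] * n for _ in range(n)]
    for index, (v, u) in enumerate(zigzag_positions(n)):
        qf[v][u] = qfs[index]
    return qf

# Feeding the indices 0..63 shows where each scan position lands in the block.
qf = inverse_scan(list(range(64)))
```

With this input, qf[0] is the first row of the scan pattern and qf[7][7] holds the last scan index, 63.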

[0041] MPEG-2 operates on a macroblock level for motion compensation, a block level for the DCT transformation,
and the coefficient level for run-length and lossless coding. Moreover, MPEG-2 allows three types of pictures, namely
I-, P- and B- pictures. Allowed motion prediction modes (forward, backward, bi-directional) are specified for the P- and
B- pictures. MPEG-2 uses interlaced coding tools to handle interlaced sources more efficiently.


2.2 MPEG-4 

[0042] FIG. 2 shows the MPEG-4 video decoding process without any scalability features. 
[0043] At the decoder 200, data from a channel is output from a demux 210. A coded bit stream of shape data is
provided to a switch 215, along with the MPEG-4 term video_object_layer_shape (which indicates, e.g., whether
the current image is rectangular, binary only, or grayscale). If video_object_layer_shape is equal to "00" then no
binary shape decoding is required. Otherwise, binary shape decoding is carried out.

[0044] If binary shape decoding is performed, a shape decoding function 220 receives the previous reconstructed
VOP 230 (which may be stored in a memory), and provides a shape-decoded output to a motion compensation function
240. The motion compensation function 240 receives an output from a motion decoding function 235, which, in turn,
receives a motion coded bit stream from the demux 210. The motion compensation function 240 also receives the
previous reconstructed VOP 230 to provide an output to a VOP reconstruction function 245.
[0045] The VOP reconstruction function 245 also receives data from a texture decoding function 250 which, in turn,
receives a texture coded bit stream from the demux 210, in addition to an output from the shape decoding function
220. The texture decoding function 250 includes a variable length decoding function 255, an inverse scan function
260, an inverse DC and AC prediction function 270, an inverse quantization function 280 and an Inverse DCT (IDCT)
function 290.

[0046] Compared to MPEG-2, several new tools are adopted in MPEG-4 to add features and interactivity, e.g., sprite
coding, shape coding, still texture coding, scalability, and error resilience. Moreover, motion compensation and texture
coding tools in MPEG-4, which are similar to MPEG-2 video coding, are modified to improve the coding efficiency,
e.g., coding tools such as direct mode motion compensation, unrestricted motion compensation, and advanced
prediction.

[0047] In particular, direct mode motion compensation is used for B-VOPs. Specifically, it uses direct bi-directional
motion compensation derived by employing I- or P-VOP macroblock MVs and scaling them to derive forward and
backward MVs for macroblocks in B-VOP. Only one delta MV is allowed per macroblock. The actual MV is calculated
from the delta vector and the scaled MV from its co-located macroblock.

[0048] Unrestricted motion compensation allows one or four MVs per macroblock. The four MV mode is only possible
in B-VOPs with the use of direct mode. Note that the MV for a chrominance macroblock is the average of four MVs
from its associated luminance macroblock. Furthermore, unrestricted motion compensation allows an MV to point out
of the reference frame (the out-of-bound texture is padded from the edge pixel).
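
The edge-pixel padding used when an MV points outside the reference frame can be sketched by clamping coordinates. This is an illustration only: it assumes integer-pel displacement and a frame stored as a list of rows, and the names padded_pixel and fetch_block are hypothetical.

```python
def padded_pixel(frame, y, x):
    """Read frame[y][x], replicating the nearest edge pixel when out of bounds."""
    height, width = len(frame), len(frame[0])
    yc = min(max(y, 0), height - 1)    # clamp the row index into the frame
    xc = min(max(x, 0), width - 1)     # clamp the column index into the frame
    return frame[yc][xc]

def fetch_block(frame, top, left, size=8):
    """Fetch a size x size prediction block whose origin may lie off-frame."""
    return [[padded_pixel(frame, top + dy, left + dx) for dx in range(size)]
            for dy in range(size)]

# Hypothetical 4x4 reference frame; the pixel value encodes its position.
reference = [[10 * y + x for x in range(4)] for y in range(4)]
corner = fetch_block(reference, 2, 3, size=2)   # straddles the right edge
```

In the example, the columns beyond the right edge repeat the last in-frame column, so corner is [[23, 23], [33, 33]].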

[0049] Advanced prediction defines the prediction method for MV and DCT coefficients. An MV predictor is set
according to the median value of its three neighbors' MVs. Prediction of the intra DCT coefficient follows the intra AC/DC
prediction procedure (Graham's rule).
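
The median MV predictor can be sketched as below. The sketch assumes the median is taken separately for the horizontal and vertical components, and it ignores the special cases at picture or slice boundaries where fewer than three neighbors exist. The names median3, predict_mv and mv_residual are hypothetical.

```python
def median3(a, b, c):
    """Median of three scalar values."""
    return sorted((a, b, c))[1]

def predict_mv(mv1, mv2, mv3):
    """Component-wise median of three neighboring MVs, each given as (x, y)."""
    return (median3(mv1[0], mv2[0], mv3[0]),
            median3(mv1[1], mv2[1], mv3[1]))

def mv_residual(mv, pmv):
    """Difference between the actual MV and its predictor (PMV)."""
    return (mv[0] - pmv[0], mv[1] - pmv[1])

# Three hypothetical neighbor MVs and the current macroblock's MV.
pmv = predict_mv((2, -1), (4, 3), (-6, 0))
residual = mv_residual((3, 1), pmv)
```

Note that the predictor (2, 0) in this example is not equal to any single neighbor MV, because each component is selected independently.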

3. Transcoder architecture

[0050] FIG. 3 illustrates a low complexity front-to-back transcoder in accordance with the invention, with B frames 
disabled. 

[0051] Similarities between the structures of MPEG-2 and MPEG-4 allow a low complexity (front-to-back) transcoder.
Instead of completely decoding an MPEG-2 bitstream to the spatial (pixel) domain level, the front-to-back transcoder
300 uses DCT coefficients and MVs to generate an MPEG-4 bitstream without actually performing a motion estimation
process. A trade-off is that this architecture may cause a drift in the reconstructed frame, and does not allow bit rate
control. However, the drift problem is small since most of the difference between the MPEG-2 and MPEG-4 decoders
lies in the lossless coding part.

[0052] The transcoder 300 comprises a cascade of an MPEG-2 bitstream reader (decoder) (310-330) and an MPEG-4
header and texture coder (encoder) (340-370), along with a header decoding function 304, a look-up table 308, and
a communication path 312. The transcoder 300 reads an input MPEG-2 bitstream, performs a variable length decoding
(VLD) at a function 310 on DCT coefficients and MV residual, and then follows MPEG-2 logic to find DCT coefficients
and/or MVs of every block in the frame.

[0053] The header decoding function 304 decodes the MPEG-2 headers and provides them to a look-up table (or
analogous function) 308, which uses the tables detailed below to obtain corresponding MPEG-4 headers.
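
One way to picture the look-up function 308 is as a pair of dictionaries. This is a sketch under assumptions rather than the actual implementation: it covers only a few VOP-level fields whose values are stated in Table 4 (fixed codes, or flags copied from MPEG-2), omits fields marked "Derived", "Regenerated" or "Recomputed", and represents the decoded MPEG-2 picture header as a plain dictionary. The names VOP_FIXED, VOP_COPIED and map_vop_header are hypothetical.

```python
# Fields that Table 4 sets to a constant value.
VOP_FIXED = {
    "vop_coded": 1,                        # subsequent data exists for the VOP
    "vop_rounding_type": 0,
    "change_conversion_ratio_disable": 1,
    "vop_constant_alpha": 0,
    "intra_dc_vlc_thr": 0,                 # intra DC VLC for the entire VOP
}

# Fields that Table 4 lists as "Same as MPEG-2": MPEG-4 name -> MPEG-2 name.
VOP_COPIED = {
    "top_field_first": "top_field_first",
    "alternate_vertical_scan_flag": "alternate_scan",
}

def map_vop_header(mpeg2_picture_header):
    """Build a partial MPEG-4 VOP header from decoded MPEG-2 picture fields."""
    vop_header = dict(VOP_FIXED)
    for mpeg4_name, mpeg2_name in VOP_COPIED.items():
        vop_header[mpeg4_name] = mpeg2_picture_header[mpeg2_name]
    return vop_header

# Hypothetical decoded MPEG-2 picture header fields.
vop = map_vop_header({"top_field_first": 1, "alternate_scan": 0})
```

Keeping the constant and copied fields in separate tables mirrors the "Code" column of the header tables, where an entry is either a literal value or a reference to an MPEG-2 field.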
[0054] With the information of the headers, DCT coefficients and/or MV, the transcoder 300 encodes this information
into the MPEG-4 format. Note that the reference frame is not needed in this architecture.

[0055] The transcoder 300 reads the MPEG-2 header from the input bitstream and writes the corresponding MPEG-4
header in its place in an output bitstream.

[0056] After processing at the VLD 310, the data is provided to an inverse scan function 320, and an inverse
quantisation function 330. Next, using the MPEG-4 header information provided via the path 312, the decoded, DCT
coefficient data is processed at an MPEG-4 header and texture coder that includes a quantisation function 340, and an
AC/DC prediction function 350 for differentially encoding the quantised DCT coefficients. In particular, the AC/DC
prediction process generates a residual of DC and AC DCT coefficients in an intra MB by subtracting the DC coefficient
and either the first row or first column of the AC coefficients. The predictor is adaptively selected. Note that the AC/DC
prediction function 350 may not need the MPEG-4 header information.
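
The adaptive choice of predictor can be illustrated for the DC coefficient. The text does not spell out the selection rule, so this sketch assumes the gradient comparison of the MPEG-4 visual standard (compare the left and above-left DC values with the above-left and above DC values) and further assumes that all DC values are already quantised with the same scaler. The names select_dc_predictor and dc_residual are hypothetical.

```python
def select_dc_predictor(dc_left, dc_above_left, dc_above):
    """Pick the prediction direction from the gradients of the neighboring DCs."""
    # A small vertical gradient on the left side suggests the block above
    # is the better predictor; otherwise the block to the left is used.
    if abs(dc_left - dc_above_left) < abs(dc_above_left - dc_above):
        return "above", dc_above
    return "left", dc_left

def dc_residual(dc_current, dc_left, dc_above_left, dc_above):
    """Return the chosen direction and the DC residual to be entropy coded."""
    direction, predictor = select_dc_predictor(dc_left, dc_above_left, dc_above)
    return direction, dc_current - predictor

# Hypothetical quantised DC values of the three neighbors and the current block.
result = dc_residual(45, dc_left=100, dc_above_left=100, dc_above=40)
```

When AC prediction is enabled, the same direction would decide whether the first row or the first column of AC coefficients is predicted; that part is omitted from this sketch.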

[0057] Subsequently, a scan/run-length coding function 360 and a variable length encoding function 370 provide the
MPEG-4 bitstream.

[0058] FIG. 4 illustrates a transcoder architecture that minimizes drift error in accordance with the invention, with B 
frames enabled. 

[0059] Like-numbered elements correspond to one another in the figures. 
[0060] To counter the problems of drift in the reconstructed frame, and the lack of bit rate control, a more complex
architecture such as the transcoder 400, which is an extension of the transcoder 300 of FIG. 3, can be used. This
architecture actually computes the DCT coefficient of the texture/residual data, hence motion compensation is required.
Since the encoder of this transcoder includes a decoding process, the drift error can be minimized.

[0061] Moreover, the transcoder 400 can be used to transcode bitstreams with B-frames since MPEG-4 does not
allow intra mode for B-frames. The transcoder 400 treats a block in intra mode in a B-frame (in MPEG-2) as a block
with a zero MV in inter mode (in MPEG-4). It can be either a zero residual MV (PMV) or zero MV (which may yield a
non-zero MV code) since the MV is predictive coded against the PMV.

[0062] In particular, the transcoder 400 includes a variable length decoding function 405 that provides MV residue
data to a MV decoder 425, and that provides DCT coefficient data to the inverse scan function 320. The DCT data is
processed by the inverse quantisation function 330 and an inverse DCT function 420 to obtain pixel domain data.
Intra-coded pixel data is provided via a path 422 to a buffer 450, while inter-coded pixel data is provided to an adder
435 via a path 424.

[0063] The pixel (difference) data on path 424 is added to reference pixel data from a motion compensation function 
430 (responsive to the MV decoder 425) to provide inter-coded data to the buffer 450 via a path 448. 

[0064] For re-encoding, e.g., in the MPEG-4 format, the buffer 450 either outputs the intra pixel data directly to a
DCT function 455, or outputs the inter pixel data to a subtractor 445, where a difference relative to an output from a
motion compensation function 440 (responsive to the MV decoder 425) is provided to the DCT function 455.
[0065] The DCT coefficients are provided from the DCT function 455 to the quantisation function 340, and the
quantised DCT data is then provided to the AC/DC (DCT coefficient) prediction function 350, where AC and DC residuals
of the current MB are generated. These residuals of DCT coefficients are entropy coded. The output data is provided
to the scan/run-length coding function 360, and the output thereof is provided to the variable length encoding function
370 to obtain the MPEG-4 compliant bitstream.

[0066] The quantised DCT coefficients are also output from the quantisation function 340 to an inverse quantisation
function 495, the output of which is provided to an inverse DCT function 490, the output of which is summed at an
adder 485 with the output of the motion compensation function 440. The output of the adder 485 is provided to a buffer
480, and subsequently to the motion compensation function 440.

[0067] The header decoding function 304 and look-up table 308 and path 312 operate as discussed in connection 
with FIG. 3 to control the re-encoding to the MPEG-4 format at functions 340-370. 

4. Implementation of the Format Transcoder

[0068] This section explains the implementation of the format transcoding, e.g., as implemented in FIGs. 3 and 4,
discussed above, and FIG. 5, to be discussed later. Minor implementation details (e.g., systems-related details such
as the use of time stamps and the like) that are not specifically discussed should be apparent to those skilled in the art.
[0069] In a particular implementation, the transcoders of the present invention can be used to convert a main-profile,
main-level (MP@ML) MPEG-2 bitstream into a main-profile MPEG-4 bitstream. It is assumed that the MPEG-2
bitstream is coded in frame picture structure with B-picture coding (no dual prime prediction). Generally, the same coding
mode which is used in MPEG-2 coding should be maintained. This mode is likely to be optimum in MPEG-4 and hence
avoids the complexity of the mode decision process. The transparency pattern in MPEG-4 is always 1 (one rectangular
object with the same size of VOP in one VOP). That is, MPEG-4 allows an arbitrarily shaped object which is defined
by a nonzero transparency pattern. This feature does not exist in MPEG-2, so it can safely be assumed that all
transparency patterns of the transcoding object are one.


4.1 MPEG-2 bitstream reader 

[0070] A transcoder in accordance with the invention obtains the bitstream header, DCT coefficients and MVs from
the MPEG-2 bitstream. This information is mixed together in the bitstream. Both MPEG-2 and MPEG-4 bitstreams
adopt a hierarchical structure consisting of several layers. Each layer starts with a header followed by multiple
instances of its sublayer. In this implementation, as shown in Table 1, the MPEG-2 layer has a direct translation into the
MPEG-4 layer, except the slice layer in MPEG-2, which is not used in MPEG-4. DC coefficients and predicted MVs in
MPEG-4 are reset at the blocks that start the slice.

[0071] However, some MPEG-4 headers are different from MPEG-2 headers, and vice versa. Fortunately, the
restrictions in MPEG-2 and the MPEG-2 header information are sufficient to specify an MPEG-4 header. Tables 2 through 6
list MPEG-4 headers and their relation to an MPEG-2 header or restriction at each layer.



Table 1.

Relationship between MPEG-2 and MPEG-4 layers

MPEG-2                        MPEG-4
Video Sequence                Video Object Sequence (VOS) / Video Object (VO)
Sequence Scalable Extension   Video Object Layer (VOL)
Group of Picture (GOP)        Group of Video Object Plane (GOV)
Picture                       Video Object Plane (VOP)
Macroblock                    Macroblock






Table 2.

MPEG-4 header and its derivation (VOS and VO)

Header                             Code                 Comment
Visual_object_sequence_start_code  000001B0             Initiate a visual session
Profile_and_level_indication       00110100             Main Profile/Level 4
Visual_object_sequence_end_code    000001B1             Terminate a visual session
Visual_object_start_code           000001B5             Initiate a visual object
Is_visual_object_identifier        0                    No version identification or priority needs to be specified
Visual_object_type                 0001                 Video ID
Video_object_start_code            0000010X-0000011X    Mark a new video object
Video_signal_type                  Derived from MPEG-2  Corresponds to MPEG-2 sequence_display_extension_id
Video_format                       Same as MPEG-2       Corresponds to MPEG-2 sequence_display_extension_id
Video_range                        Derived from MPEG-2  Corresponds to MPEG-2 sequence_display_extension_id
Colour_description                 Same as MPEG-2       Corresponds to MPEG-2 sequence_display_extension_id
Colour_primaries                   Same as MPEG-2       Corresponds to MPEG-2 colour_description
Transfer_characteristics           Same as MPEG-2       Corresponds to MPEG-2 colour_description
Matrix_coefficients                Same as MPEG-2       Corresponds to MPEG-2 colour_description



Table 3.

MPEG-4 header and its derivation (VOL)

Header                            Code                 Comment
Video_object_layer_start_code     0000012X             Mark a new video object layer
Random_accessible_vol             0                    Allow non-intra coded VOP
Video_object_type_identification  00000100             Main object type
Is_object_type_identifier         0                    No version identification or priority needs to be specified
Aspect_ratio_info                 Same as MPEG-2       Corresponds to MPEG-2 aspect_ratio_information
Par_width                         Same as MPEG-2       Corresponds to MPEG-2 vertical_size
Par_height                        Same as MPEG-2       Corresponds to MPEG-2 horizontal_size
Vol_control_parameters            Same as MPEG-2       Corresponds to MPEG-2 extension_start_code_identifier (sequence extension)
Chroma_format                     Same as MPEG-2       Corresponds to MPEG-2 chroma_format
Low_delay                         Same as MPEG-2       Corresponds to MPEG-2 low_delay
Vbv_parameters                    Recomputed           Follow MPEG-4 VBV spec.
Video_object_layer_shape          00                   Rectangular
Vop_time_increment_resolution     Recomputed           See Table 7
Fixed_vop_rate                    1                    Indicate that all VOPs are coded at a fixed rate
Fixed_vop_time_increment          Recomputed           See Table 7
Video_object_layer_width          Same as MPEG-2       Corresponds to display_vertical_size
Video_object_layer_height         Same as MPEG-2       Corresponds to display_horizontal_size
Interlaced                        Same as MPEG-2       Corresponds to progressive_sequence
Obmc_disable                      1                    Disable OBMC
Sprite_enable                     0                    Indicate absence of sprite
Not_8_bit                         Derived from MPEG-2  Corresponds to MPEG-2 intra_dc_precision
Quant_type                        1                    MPEG quantization
Complexity_estimation_disable     1                    Disable complexity estimation header
Resync_marker_disable             1                    Indicate absence of resync_marker






EP 1 401 209 A2 



Table 3. (continued)

MPEG-4 header and its derivation (VOL)

Header                            Code                 Comment
Data_partitioned                  0                    Disable data partitioning
Reversible_vlc                    0                    Disable reversible VLC
Scalability                       0                    Indicate that the current layer is used as base-layer



Table 4.

MPEG-4 header and its derivation (VOP)

Header                           Code                 Comment
Vop_start_code                   000001B6             Mark a start of a video object plane
Vop_coding_type                  Same as MPEG-2       Corresponds to MPEG-2 picture_coding_type
Modulo_time_base                 Regenerated          Follow MPEG-4 spec.
Vop_time_increment               Regenerated          Follow MPEG-4 spec.
Vop_coded                        1                    Indicate that subsequent data exists for the VOP
Vop_rounding_type                0                    Set value of rounding_control to '0'
Change_conversion_ratio_disable  1                    Assume that conv_ratio is '1' for all macroblocks
Vop_constant_alpha               0                    Not include vop_constant_alpha_value in the bitstream
Intra_dc_vlc_thr                 0                    Use intra DC VLC for entire VOP
Top_field_first                  Same as MPEG-2       Corresponds to MPEG-2 top_field_first
Alternate_vertical_scan_flag     Same as MPEG-2       Corresponds to MPEG-2 alternate_scan
Vop_quant                        Derived from MPEG-2  Corresponds to MPEG-2 quantiser_scale_code
Vop_fcode_forward                Same as MPEG-2       See section 4.3
Vop_fcode_backward               Same as MPEG-2       See section 4.3



Table 5.

MPEG-4 header and its derivation (macroblock and MV)

Header | Code | Comment
Not_coded | Derived from MPEG-2 | Corresponds to MPEG-2 macroblock_address_increment
Mcbpc | Derived from MPEG-2 | Corresponds to MPEG-2 macroblock_type
Ac_pred_flag | 0 | Disable intra AC prediction
Cbpy | Derived from MPEG-2 | See section 4.2
Dquant | Derived from MPEG-2 | See section 4.2
Modb | Derived from MPEG-2 | Corresponds to macroblock_type
Mb_type | Derived from MPEG-2 | Corresponds to macroblock_type
Cbpb | Derived from MPEG-2 | See section 4.2
Dbquant | Derived from MPEG-2 | See section 4.2
Horizontal_mv_data | Derived from MPEG-2 | Corresponds to MPEG-2 motion_code[r][s][0]
Vertical_mv_data | Derived from MPEG-2 | Corresponds to MPEG-2 motion_code[r][s][1]
Horizontal_mv_residual | Derived from MPEG-2 | Corresponds to MPEG-2 motion_residual[r][s][0]
Vertical_mv_residual | Derived from MPEG-2 | Corresponds to MPEG-2 motion_residual[r][s][1]



Table 6.

MPEG-4 header and its derivation (block and interlaced information)

Header | Code | Comment
Dct_dc_size_luminance | Same as MPEG-2 | Corresponds to MPEG-2 dct_dc_size_luminance
Dct_dc_differential | Same as MPEG-2 | Corresponds to dct_dc_differential
Dct_dc_size_chrominance | Same as MPEG-2 | Corresponds to MPEG-2 dct_dc_size_chrominance
DCT_coefficient | Derived from MPEG-2 | See section 4.2
DCT_type | Same as MPEG-2 | Corresponds to MPEG-2 DCT_type
Field_prediction | Same as MPEG-2 | Corresponds to MPEG-2 frame_motion_type
Forward_top_field_reference | Same as MPEG-2 | Corresponds to MPEG-2 motion_vertical_field_select[0][0]
Forward_bottom_field_reference | Same as MPEG-2 | Corresponds to MPEG-2 motion_vertical_field_select[1][0]
Backward_top_field_reference | Same as MPEG-2 | Corresponds to MPEG-2 motion_vertical_field_select[0][1]
Backward_bottom_field_reference | Same as MPEG-2 | Corresponds to MPEG-2 motion_vertical_field_select[1][1]



Table 7.

Mapping of frame_rate_code in MPEG-2 to vop_time_increment_resolution and fixed_vop_time_increment in MPEG-4.

Frame_rate_code | Vop_time_increment_resolution | Fixed_vop_time_increment
0001 | 24,000 | 1001
0010 | 24 | 1
0011 | 25 | 1
0100 | 30,000 | 1001
0101 | 30 | 1
0110 | 50 | 1
0111 | 60,000 | 1001
1000 | 60 | 1
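The mapping of Table 7 lends itself to a look-up table. The following is a minimal sketch, not part of the described transcoder; the names FRAME_RATE_TABLE and vop_timing are hypothetical and chosen only for illustration. The frame rate follows as the ratio of the two MPEG-4 fields.

```python
from fractions import Fraction

# Table 7 as a lookup (hypothetical name): frame_rate_code ->
# (vop_time_increment_resolution, fixed_vop_time_increment).
FRAME_RATE_TABLE = {
    0b0001: (24000, 1001),
    0b0010: (24, 1),
    0b0011: (25, 1),
    0b0100: (30000, 1001),
    0b0101: (30, 1),
    0b0110: (50, 1),
    0b0111: (60000, 1001),
    0b1000: (60, 1),
}


def vop_timing(frame_rate_code):
    """Return (resolution, increment, frames per second) for an MPEG-2 code."""
    resolution, increment = FRAME_RATE_TABLE[frame_rate_code]
    return resolution, increment, Fraction(resolution, increment)
```

For example, vop_timing(0b0100) yields a resolution of 30000 ticks per second and an increment of 1001 ticks per VOP, i.e. 30000/1001 frames per second.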



[0072] MV data is stored in the macroblock layer. Up to four MVs are possible for each macroblock. Moreover, an MV
can be of either field or frame type and have either full-pixel or half-pixel resolution. The MPEG-2 MV decoding process
is employed to determine motion_code (VLC) and motion_residual (FLC) and, hence, delta. Combined with the predictive
MV, delta gives the field/frame MV. The MV for skipped macroblocks is set to zero.
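How motion_code and motion_residual combine into delta, and delta with the predictive MV, can be sketched as follows. This is written from memory of the MPEG-2 MV decoding process (ISO/IEC 13818-2, subclause 7.6.3.1) and should be checked against the standard; the function name decode_mv_component is hypothetical.

```python
def decode_mv_component(motion_code, motion_residual, f_code, pmv):
    """Reconstruct one MV component from its MPEG-2 syntax elements.

    pmv is the predictive MV component; f_code selects the vector range.
    """
    r_size = f_code - 1
    f = 1 << r_size
    low, high, span = -16 * f, 16 * f - 1, 32 * f

    if f == 1 or motion_code == 0:
        delta = motion_code
    else:
        delta = (abs(motion_code) - 1) * f + motion_residual + 1
        if motion_code < 0:
            delta = -delta

    # Add the prediction and wrap the result back into [low, high].
    vector = pmv + delta
    if vector < low:
        vector += span
    elif vector > high:
        vector -= span
    return vector
```

With f_code = 2, motion_code = 2 and motion_residual = 1, delta is (2 - 1) x 2 + 1 + 1 = 4, so a zero predictor gives a vector component of 4.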

[0073] DCT data is stored in the block layer. It is first decoded from the bitstream (VLC), inverse scanned using either
the zigzag or the alternate scanning pattern, and then inverse quantized. The intra DC coefficient is determined from
dct_dc_differential and the predictor (the predictor is reset according to the MPEG-2 spec). DCT coefficients in a skipped
macroblock are set to zero.

4.2 Texture coding 

[0074] A transcoder in accordance with the invention reuses DCT coefficients (for inter frame). The following guidelines
should be used:

1. q_scale_type = 1 (linear scale) is used in MPEG-2 quantization.

2. Only the MPEG quantization method (not H.263) should be used in MPEG-4 quantization to reduce the mismatch
between the MPEG-2 and MPEG-4 reconstructed frames (drift).

3. A differential value of the MPEG-2 QP determines dquant in MPEG-4. dquant is set to ±2 whenever the magnitude of
the differential value is greater than 2. dquant is a 2-bit code which specifies a change in the quantizer, quant, for I- and P-VOPs.

4. The quantization matrix should be changed following the change of matrix in the MPEG-2 bitstream.

5. The transcoder has the flexibility of enabling an alternate vertical scanning method (for interlaced sequences) at
the VOL level.

6. Intra AC/DC prediction (which involves scaling when the QP of the current block is not the same as that of the
predicted block) should be turned off at the macroblock level to reduce complexity and mismatch in AC quantization.

7. Higher efficiency can be obtained with the use of intra_dc_vlc_thr to select the proper VLC table (AC/DC) for
coding of intra DC coefficients, e.g., as a function of the quantization parameter (except when intra_dc_vlc_thr is
either 0 or 7; these thresholds force the use of the intra DC or AC table regardless of the QP).

8. A skipped macroblock is coded as a not_coded macroblock (all DCT coefficients are zero).

9. Cbpy and cbpc (CBP) are set according to coded_block_pattern_420 (CBP_420). Note that there is a slight
discrepancy between CBP in MPEG-4 and CBP_420 in MPEG-2 for an intra macroblock. Specifically, when
CBP_420 is set, it indicates that at least one of the DCT coefficients in that block is not zero. CBP contains similar
information except that it does not correspond to a DC coefficient in an intra macroblock (also depending on
intra_dc_vlc_thr). Hence, it is possible that CBP is not zero when CBP_420 is zero in an intra macroblock (this
case can happen in an I-VOP and a P-VOP, but not in a B-VOP).

[0075] There are three sources of loss in texture coding, namely QP coding, DC prediction and the nonlinear scaler for
DC quantization. MPEG-4 uses differential coding to code a QP. MPEG-2 allows all possible 32 QP values at the
expense of 5 bits. However, the MPEG-4 differential value can take up to ±2 (in QP value units) and, hence, a differential value
greater than ±2 is lost. This loss can be minimized by limiting the QP fluctuation among the macroblocks in the MPEG-2
rate control algorithm. All intra macroblocks perform adaptive DC prediction, which may take a different prediction
from the previous macroblock (MPEG-2 DC prediction), thereby causing a different DC residual for the quantization.
DC coefficients of all intra macroblocks in MPEG-4 are also quantised in a different manner from MPEG-2 because of
the nonlinear scaler. Therefore, quantised DC coefficients for MPEG-2 and MPEG-4 coding are likely to be different
for an intra macroblock.
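The QP-coding loss can be illustrated with a short sketch. It assumes, for illustration only, that the differential is taken against the QP actually reached on the MPEG-4 side, so that a large MPEG-2 step is caught up over several macroblocks; the function name dquant_sequence is hypothetical.

```python
def dquant_sequence(mpeg2_qps, start_qp):
    """Follow a list of MPEG-2 QP values with steps limited to [-2, +2].

    Returns one (dquant, mpeg4_qp, remaining_error) tuple per macroblock.
    """
    qp = start_qp
    result = []
    for target in mpeg2_qps:
        dquant = max(-2, min(2, target - qp))  # clamp the differential to +/-2
        qp += dquant
        result.append((dquant, qp, target - qp))
    return result
```

For an MPEG-2 QP that jumps from 10 to 15, the MPEG-4 QP in this sketch needs three macroblocks (12, 14, 15) to catch up, which is why limiting QP fluctuation in the MPEG-2 rate control reduces the loss.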

4.3 MV coding

[0076] The transcoder encodes MVs into an MPEG-4 format. There is no error involved in transcoding an
MV from MPEG-2 to MPEG-4, since MV coding is a lossless process. The following constraints are imposed on an
MPEG-4 encoder:

1. Unrestricted motion compensation mode is disabled, which means that no MV points outside the boundary of the
frame.

2. Advanced prediction mode is employed. A different predictor (a median value) is used in an MPEG-4 bitstream,
but an MV for an 8x8 pel block is not. That is, advanced prediction mode allows an 8x8 MV and a nonlinear (median filter)
predictor. Only the nonlinear predictor is used in our format transcoder (we still keep a 16x16 MV).

3. Direct mode is not allowed in an MPEG-4 bitstream, which means there are only four MV types for a B-VOP,
i.e., 16x16 forward and backward vectors and 16x8 forward and backward field vectors.

4. Field motion compensation is applied whenever a 16x8 field vector is used (maintain mode).

5. A skipped macroblock is coded as a not_coded macroblock (motion compensation with zero MV).

6. A single f_code is allowed in MPEG-4. Therefore, the larger f_code in MPEG-2 between the two directions (vertical,
horizontal) is converted to f_code in MPEG-4 based on the following relationship: f_code(MPEG-4) = f_code(MPEG-2) - 1.

7. A padding process is not used since the texture for the entire reference frame is known.

8. Field motion compensation is used whenever dual prime arithmetic is activated. Vector parities (field of the
reference and field of the predicting frame) are preserved. Field MVs are generated according to vector[0][0][1:0],
which is coded in the MPEG-2 bitstream. When prediction of the same parity is used (e.g., top field to top field, or
bottom field to bottom field), both field MVs are vector[0][0][1:0]. When prediction of the odd parity is used (e.g.,
top field to bottom field, or bottom field to top field), the top field MV uses vector[2][0][1:0] and the bottom field MV
uses vector[3][0][1:0]. Vector[r][0][0:1] for r=2,3 can be computed as follows:

(a) Vector[r][0][0] = (vector[0][0][0] x m[parity_ref][parity_pred]//2) + dmvector[0].

(b) Vector[r][0][1] = (vector[0][0][1] x m[parity_ref][parity_pred]//2) + e[parity_ref][parity_pred] + dmvector[1].

[0077] Note that m[parity_ref][parity_pred] and e[parity_ref][parity_pred] are defined in Tables 7-11 and 7-12,
respectively, in the MPEG-2 specification (ISO/IEC 13818-2).

[0078] Moreover, "r" denotes the order of the MV, e.g., first, second, etc. r=0 denotes the first set of MVs, and r=1
denotes the second set of MVs. Dual prime prediction uses r=2 and r=3 to identify two extra sets of MVs.

[0079] "//" denotes integer division with rounding to the nearest integer.
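Equations (a) and (b) can be sketched as below. The function names are hypothetical, the m and e arguments stand for the entries the caller looks up in Tables 7-11 and 7-12 of the MPEG-2 specification (the values in the usage note are placeholders), and ties in "//" are assumed to round away from zero, which is how the MPEG-2 convention is understood here.

```python
def div2_round(x):
    """x // 2 with rounding to the nearest integer (ties away from zero, assumed)."""
    sign = 1 if x >= 0 else -1
    return sign * ((abs(x) + 1) // 2)


def dual_prime_vector(vector, dmvector, m, e):
    """Derive vector[r][0][0:1] for r = 2 or 3.

    vector and dmvector are (horizontal, vertical) pairs taken from
    vector[0][0][...] and dmvector[...]; m and e are the table entries
    for the parity combination in question.
    """
    horizontal = div2_round(vector[0] * m) + dmvector[0]    # equation (a)
    vertical = div2_round(vector[1] * m) + e + dmvector[1]  # equation (b)
    return horizontal, vertical
```

As a usage example with placeholder table entries m = 3 and e = 1, dual_prime_vector((3, -3), (1, 0), 3, 1) evaluates to (6, -4): 9//2 rounds to 5 and -9//2 rounds to -5 before the offsets are added.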

4.4 Coding of intra MB in B-VOP 

[0080] Additional conversion is necessary when coding an intra MB in a B-frame of an MPEG-2 bitstream (e.g., as
shown in FIG. 4). MPEG-4 replaces intra mode with direct mode for a B-VOP and hence an intra MB in a B-frame has to
be coded differently in the MPEG-4 syntax. There are two practical solutions to this problem.

[0081] The first solution employs an architecture similar to the front-to-back transcoder of FIG. 3 (no buffer for the
entire reference frame). MC is performed against the previous MB (or the previous MB without compensating the texture
residual, at the expense of extra memory with the size of one MB) in the same VOP, under the assumption that
this MB is close enough to its reference MB (its uncompensated version). The MV for the intra MB equals the MV of
the previous MB offset by its MB distance.

[0082] The second solution uses an architecture similar to the one shown in FIG. 4. It keeps the reference frame
for all I- and P-VOPs. Note that MC has to be performed on all P-VOPs in this solution. The MV for the intra MB is the
same as the predicted MV (median of its three neighbors) and MC is performed against the reference MB pointed to by
the derived MV.
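The predicted MV of the second solution can be sketched as a component-wise median. The choice of the left, above and above-right macroblocks as the three neighbors is an assumption based on the usual MPEG-4 predictor; the function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three scalars."""
    return sorted((a, b, c))[1]


def predicted_mv(left, above, above_right):
    """Component-wise median of three neighboring MVs, each an (x, y) pair."""
    return (median3(left[0], above[0], above_right[0]),
            median3(left[1], above[1], above_right[1]))
```

Note that the result need not equal any one of the three input vectors, since the horizontal and vertical medians are taken independently.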

5. Video downscaling in the compressed domain 

[0083] Generally, video downscaling and size transcoding have the same meaning. Downsampling means subsampling
with an anti-aliasing (low pass) filter, but subsampling and downsampling are used interchangeably herein.

[0084] Size transcoding becomes computationally intensive when its input and output are in the compressed domain.
A video downscaling process which limits its operations to the compressed domain (and, in effect, avoids decoding
and encoding processes) provides a much reduced complexity. However, two new problems arise with downscaling
in the compressed domain, i.e., downsampling of DCT coefficients and of MV data.

[0085] Recently, video downscaling algorithms in the compressed domain have been discussed, but they do not
address the complete transcoding between MPEG-2 and MPEG-4, which includes field-to-frame deinterlacing. The
present invention addresses this problem.

[0086] Subsections 5.1 and 5.2 provide solutions to the two new problems in the downsampling process. The implementation
of a proposed size transcoder in accordance with the invention is described in section 6 and Figures 5 and 6.

5.1 Subsampling of DCT block 

[0087] In frame-based video downscaling, it is necessary to merge four 8x8 DCT blocks into a new 8x8 DCT block
(specific details involving a field block will be described later). Moreover, the output block should be a low pass version
of the input blocks. This process is carried out in the spatial domain by multiplying the input matrix with a subsampling
matrix (preferably with a low pass filter). Multiplication by a subsampling matrix in the spatial domain is equivalent to
multiplication by DCT coefficients of a subsampling matrix in the DCT domain because of the distributive property of
the orthogonal transform. However, the number of operations (computations) in the downsampling process in the DCT
domain for some downsampling filters can be as high as the total number of operations of its counterpart in the spatial
domain. The solution to this problem is to employ a downsampling matrix which is sparse (i.e., a matrix that has
relatively few non-zero values, e.g., approximately 30% or less).

[0088] A sparse downsampling matrix may be based on the orthogonal property between the DCT basis vectors and
the symmetry structure of the DCT basis vectors. One approach, discussed in R. Dugad and N. Ahuja, "A Fast Scheme
For Downsampling And Upsampling In The DCT Domain," International Conference on Image Processing (ICIP) 99,
incorporated herein by reference, takes the lower 4x4 DCT coefficients from four processing blocks, applies a 4x4
IDCT to each DCT subblock, forms a new 8x8 pixel block and applies an 8x8 DCT to obtain an output block. The
downsampling matrix can be pre-calculated since the downsampling process is fixed. By splitting the 8x8 DCT matrix
into left and right halves, about half of the downsampling matrix values are zero because of the orthogonality between
the columns of the 4x4 IDCT matrix and the rows of both left and right 8x4 DCT matrices. This operation (one dimension)
can be written mathematically as:



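$$B = Tb = T\begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} T_L & T_R \end{bmatrix}\begin{bmatrix} T_4^{T} B_1 \\ T_4^{T} B_2 \end{bmatrix} = T_L T_4^{T} B_1 + T_R T_4^{T} B_2$$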



where $b$ is an 8x1 spatial input vector, $B$ is its corresponding 8x1 DCT vector, $b_1$ and $b_2$ are subsampled 4x1 vectors,
$B_1$ and $B_2$ are lower 4x1 DCT vectors, $T$ is the 8x8 DCT transform matrix, $T_4$ is the 4x4 DCT transform matrix, and $T_L$ and
$T_R$ are the left and right halves of $T$. The superscript $T$ denotes a matrix transpose. Dugad's algorithm also employs the
symmetry property of the DCT basis vectors: the elements of $T_L T_4^{T}$ and $T_R T_4^{T}$ are identical in terms of magnitude,
because the odd rows of $T$ are anti-symmetric and the even rows of $T$ are symmetric. Both products
can therefore be calculated based on the same components, i.e., a symmetrical part, $E$ (indexes for which $i+j$ is even), and an
anti-symmetrical part, $O$ (indexes for which $i+j$ is odd), with $T_L T_4^{T} = E + O$ and $T_R T_4^{T} = E - O$. This arrangement
effectively reduces the number of multiplications by a factor of two when the downsampling process is done as:

$$B = T_L T_4^{T} B_1 + T_R T_4^{T} B_2 = (E + O)B_1 + (E - O)B_2 = E(B_1 + B_2) + O(B_1 - B_2)$$
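The one-dimensional identity can be checked numerically. The sketch below assumes orthonormal DCT-II matrices and uses hypothetical helper names (dct_matrix, matvec, dugad_1d, direct_1d); it builds E and O by splitting the product of the left half of T with the transposed 4-point DCT according to the parity of i+j, and compares the fast path with the direct IDCT/DCT path.

```python
import math


def dct_matrix(n):
    """Orthonormal n-point DCT-II matrix as a list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows


def matvec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


T, T4 = dct_matrix(8), dct_matrix(4)

# P = T_L * T4^T (8x4), using only the left four columns of T.
P = [[sum(T[i][n] * T4[j][n] for n in range(4)) for j in range(4)]
     for i in range(8)]
# E keeps the entries with i + j even, O those with i + j odd.
E = [[P[i][j] if (i + j) % 2 == 0 else 0.0 for j in range(4)] for i in range(8)]
O = [[P[i][j] if (i + j) % 2 == 1 else 0.0 for j in range(4)] for i in range(8)]


def dugad_1d(B1, B2):
    """Fast path: B = E (B1 + B2) + O (B1 - B2)."""
    total = [a + b for a, b in zip(B1, B2)]
    diff = [a - b for a, b in zip(B1, B2)]
    return [x + y for x, y in zip(matvec(E, total), matvec(O, diff))]


def direct_1d(B1, B2):
    """Reference path: 4-point IDCT of each half, concatenate, 8-point DCT."""
    T4t = [list(col) for col in zip(*T4)]
    b = matvec(T4t, B1) + matvec(T4t, B2)
    return matvec(T, b)
```

Both paths agree to rounding error for any pair of 4x1 inputs, and the even rows of P contain a single non-zero entry each, which is where the sparsity comes from.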







[0089] Implementation of Dugad's method to convert four field blocks into one frame block is not as simple. An
extension of the downsampling process in this scenario (one dimension) can be written as:

$$B = T\left(S_T T_4^{T} B_T + S_B T_4^{T} B_B\right)$$

where $B_T$ and $B_B$ are the lower 4x1 field vectors, and $S_T$ and $S_B$ are DCT values of an 8x4 deinterlacing matrix corresponding
to its top and bottom field block, respectively. The elements of $S_T$ are $S_T(i,j) = 1$ if $j = 2i$, $0 \le i \le 3$, and $S_T(i,j) = 0$ otherwise.
The elements of $S_B$ are $S_B(i,j) = 1$ if $j = 2i+1$, $0 \le i \le 3$, and $S_B(i,j) = 0$ otherwise.

[0090] This is a modification of Dugad's algorithm for downsampling and deinterlacing in accordance with the present 
invention. 

[0091] The operations of downscaling and the deinterlacing process are more complex since S and T are not
orthogonal to each other and, hence, the downsampling matrix is not sparse. C. Yim and M.A. Isnardi, "An Efficient
Method For DCT-Domain Image Resizing With Mixed Field/Frame-Mode Macroblocks," IEEE Trans. Circ. and Syst.
For Video Technol., vol. 9, pp. 696-700, Aug. 1999, incorporated herein by reference, propose an efficient method for
downsampling a field block. A low pass filter is integrated into the deinterlacing matrix in such a way that the
downsampling matrix ($S = 0.5[I_8\ I_8]$) is sparse.

[0092] $I_8$ denotes an 8x8 identity matrix, and $[I_8\ I_8]$ denotes a 16x8 matrix that comprises a concatenation of the two
identity matrices. The identity matrix, of course, has all ones on the diagonal and all zeroes elsewhere.

[0093] The method starts with four 8x8 IDCT field blocks, then applies the downsampling matrix, S, and performs
an 8x8 DCT to obtain the output block. Note that an 8x8 IDCT is used in this method instead of a 4x4 IDCT. This
operation can be shown mathematically (in one dimension) as:
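Following that description, one way to write the operation out is sketched below. It assumes that the top and bottom field vectors, $B_T$ and $B_B$ (here 8x1 DCT vectors), are stacked after their 8-point IDCTs and that $T$ is orthonormal; this is a sketch of the stated steps, not necessarily the exact form used by Yim and Isnardi.

```latex
B = T\,S \begin{bmatrix} T^{T} B_T \\ T^{T} B_B \end{bmatrix}
  = \tfrac{1}{2}\, T \begin{bmatrix} I_8 & I_8 \end{bmatrix}
    \begin{bmatrix} T^{T} B_T \\ T^{T} B_B \end{bmatrix}
  = \tfrac{1}{2}\, T\,T^{T} \left( B_T + B_B \right)
  = \tfrac{1}{2} \left( B_T + B_B \right)
```

Under these assumptions the vertical stage reduces to averaging the top-field and bottom-field DCT vectors, which is why this choice of S is inexpensive.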



5.2 Subsampling of MV data

[0094] ME is the bottleneck of the entire video encoding process. It is hence desirable to estimate the MV of the resized
MB by using the MVs of the four original MBs without actually performing ME (assuming that all MBs are coded in inter mode).
Note that, if an MPEG-2 bitstream is assumed, subsampling of MV data takes the MVs of four MBs since each MB has
one input (only an MPEG-4 bitstream can have an MV for every block). The simplest solution is to average the four MVs
together to obtain the new MV, but it gives a poor estimate when those four MVs are different. B. Shen, I.K. Sethi and
B. Vasudev, "Adaptive Motion-Vector Resampling For Compressed Video Downscaling," IEEE Trans. Circ. and Syst.
For Video Technol., vol. 9, pp. 929-936, Sep. 1999, show that a better result can be obtained by giving more weight
to the worst predicted MV. A matching accuracy, A, of each MV is indicated by the number of nonzero AC coefficients
in that MB. By using the Shen et al. technique, the new MV for the downscaled MB can be computed as:
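A minimal sketch of such a weighted estimate follows. It assumes the activity-weighted average form commonly attributed to Shen et al., MV' = (1/2) x sum(A_i x MV_i) / sum(A_i), with the factor 1/2 accounting for the 2:1 downscaling; the function name weighted_mv and the fallback for all-zero activities are illustrative choices.

```python
def weighted_mv(mvs, activities):
    """Activity-weighted average of four (x, y) MVs, halved for 2:1 downscaling.

    activities holds the number of nonzero AC coefficients of each MB.
    """
    total = sum(activities)
    if total == 0:
        # No AC energy anywhere: fall back to the plain average.
        activities, total = [1] * len(mvs), len(mvs)
    x = sum(a * mv[0] for a, mv in zip(activities, mvs)) / total
    y = sum(a * mv[1] for a, mv in zip(activities, mvs)) / total
    return x / 2, y / 2
```

With three MVs of (4, 0) at activity 1 and one MV of (12, 8) at activity 5, the result is (4.5, 2.5), pulled towards the poorly predicted macroblock, whereas the plain average would give (3.0, 1.0).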


[0095] M.R. Hashemi, L. Winger and S. Panchanathan, "Compressed Domain Motion Vector Resampling For Downscaling
Of MPEG Video," ICIP 99, propose a nonlinear method to estimate the MV of the resized MB. Similar to the
algorithm in Shen et al., Hashemi's technique uses the spatial activity of the processing MBs to estimate the new MV. A
heuristic measurement, called Maximum Average Correlation (MAC), is employed in Hashemi's method to identify one
of the four original MVs to be the output MV. By using the MAC, the new MV for the downscaled MB can be computed as:






where ρ is the spatial correlation and is set to 0.85, and d_i is the Euclidean distance between the ith input MV (MV_i)
and the output MV.
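A selection of this kind can be sketched as follows. The score used here, the sum over i of A_i x ρ^(d_i) evaluated for each candidate output MV, is an assumed form of the MAC measure, and the final halving for 2:1 downscaling is likewise an assumption; the function name mac_select is hypothetical.

```python
import math


def mac_select(mvs, activities, rho=0.85):
    """Pick the input MV with the highest assumed MAC score, then halve it.

    mvs are (x, y) pairs; activities are the per-MB spatial activities A_i.
    """
    def score(candidate):
        # d_i is the Euclidean distance from each input MV to the candidate.
        return sum(a * rho ** math.dist(mv, candidate)
                   for a, mv in zip(activities, mvs))

    best = max(mvs, key=score)
    return best[0] / 2, best[1] / 2
```

Unlike a weighted average, the output is always one of the four input MVs (scaled), so an outlier either wins outright or is ignored.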


6. Implementation of the size transcoder 

[0096] FIG. 5 illustrates a size transcoder in accordance with the invention. B-frames may be present in the input
bitstream, but are discarded by the transcoder and therefore do not appear in the output bitstream.

[0097] In the transcoder 500, an MV scaling function 510, a DCT scaling function 520, and a spatial scaling function 540
are added. Switches 530 and 535 are coordinated so that, in a first setting, an output of the DCT function 455 is routed
into the quantisation function 340, and the switch 535 is closed to enable an output of the spatial scaling function 540
to be input to the adder 445. In a second setting of the switches 530 and 535, an output of the DCT scaling function
520 is routed into the quantisation function 340, and the switch 535 is open.

[0098] The transcoder 500 converts an MPEG-2 bitstream into an MPEG-4 bitstream which corresponds to a smaller
size video, e.g., from ITU-R 601 (720x480) to SIF (352x240).

[0099] To achieve a bandwidth requirement for the MPEG-4 bitstream, the transcoder 500 subsamples the video by
two in both the horizontal and vertical directions (at the spatial scaling function 540) and skips all B-frames (at temporal
scaling functions 545 and 546), thereby reducing the temporal resolution accordingly. Note that the temporal scaling
function 546 could alternatively be provided after the DCT scaling function 520. Skipping of B-frames before performing
downscaling reduces complexity.

[0100] Moreover, a low pass filter (which can be provided in the spatial scaling function 540) prior to subsampling
should result in improved image quality.

[0101] The invention can be extended to include other downsampling factors, and B-VOPs, with minor modifications.
Specifically, changes in MV downscaling and mode decision are made. MV downscaling for a B-VOP is a direct extension
of what was discussed to include the backward MV. The mode decision for a B-VOP can be handled in a similar way as
in the P-VOP (e.g., by converting a uni-directional MV into a bi-directional MV, as in converting an intra MB into an inter MB in a
P-VOP).
[0102] Below, we discuss six problems that are addressed by the size transcoder 500. We also assume that the input
video is 704x480 pixel resolution and coded with an MP@ML MPEG-2 encoder, and that the desired output is a simple
profile MPEG-4 bitstream which contains SIF progressive video (with a frame rate reduction by N). However, the
invention can be extended to other input and output formats and resolutions as well.

6.1 Progressive Video MV downscaling (luma) 

[0103] This problem appears when all four MBs are coded as inter, and use frame prediction. Each MV in those MBs
is downscaled by two in each direction (horizontal and vertical) to determine the MVs of the four blocks in MPEG-4
(MPEG-4 allows one MV per 8x8 block). The scaled MVs are then predictively encoded (using a median filter) using the normal
MPEG-4 procedure.

[0104] Note that each MB (comprising four blocks) has to be coded in the same mode in both MPEG-2 and MPEG-4.
With video downscaling, the output MB (four blocks) corresponds to four input MBs.

6.2 Interlaced Video MV downsampling (luma)

[0105] This problem exists when all four MBs are coded as inter and use field prediction. We need to combine the two
field MVs in each MB to get a frame MV of the resized block. Instead of setting the new MV based on the spatial activity,
the proposed transcoder picks the new MV based on its neighbors' MVs. The MVs of all eight surrounding MBs are
used to find a predictor (field MVs are averaged in the case of an MB with field prediction). The median value from these eight
MVs becomes the predictor, and the field MV of the current MB which is closer to it in terms of Euclidean distance is scaled
by two in the horizontal direction to become the new MV.
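The selection rule can be sketched as below. Two assumptions are made for illustration: the median of the eight neighboring MVs is taken per component, and "scaled by two" is read as a division of the horizontal component by two, in keeping with the 2:1 downscaling. The function name frame_mv_from_fields is hypothetical, and neighbors with field prediction are expected to have been averaged by the caller.

```python
import math
from statistics import median


def frame_mv_from_fields(top_mv, bottom_mv, neighbour_mvs):
    """Choose the field MV closest to the median of the neighbors' MVs.

    All MVs are (x, y) pairs; neighbour_mvs holds the eight surrounding MVs.
    """
    predictor = (median(mv[0] for mv in neighbour_mvs),
                 median(mv[1] for mv in neighbour_mvs))
    chosen = min((top_mv, bottom_mv),
                 key=lambda mv: math.dist(mv, predictor))
    # Only the horizontal component is rescaled (assumed division by two).
    return chosen[0] / 2, chosen[1]
```

Because the predictor is a median rather than a mean, a few outlying neighbors do not pull the choice away from the dominant motion.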

6.3 MV downsampling (chroma) 

[0106] This problem happens when all four MBs are coded as inter, and use either frame or field prediction (MPEG-4
treats both prediction modes in the same way for a chroma block). The process follows the MPEG-4 method to obtain
a chroma MV from a luma MV, i.e., a chroma MV is the downscaled version of the average of its four corresponding
8x8 luma MVs.


6.4 DCT downsampling (luma progressive, chroma) 

[0107] This problem occurs when all four luma MBs are coded as intra or inter and use the frame MB structure (their
eight chroma blocks, four for Cr and four for Cb, use either frame or field structure). Dugad's method is used to
downscale the luma and chroma DCT blocks by a factor of two in each direction.

6.5 Interlaced DCT downsampling (luma) 

[0108] This problem arises in one of two ways. First, its associated MB uses field prediction and second, its
associated MB uses frame prediction. In either case, we want to downscale four 8x8 field DCT blocks (two for the top field,
and two for the bottom field) into one 8x8 frame DCT block. The solution for the first case is to use the same field DCT
block as the one chosen for MC. The second case involves deinterlacing and we propose a combination of the Dugad
and Yim methods, discussed above.

[0109] Specifically, the transcoder first downscales four field blocks in the vertical direction (and at the same time
performs deinterlacing) based on the Yim algorithm to obtain two frame blocks. The transcoder then downscales these
two frame blocks in the horizontal direction to get the output block using the Dugad algorithm.

[0110] This is illustrated in FIG. 6, where four 8x8 coefficient field-mode DCT blocks are shown at 600, two 8x8
frame-mode DCT blocks are shown at 610, and one 8x8 frame-mode DCT block is shown at 620.

[0111] The procedure for DCT downscaling in accordance with the invention can be summarized as follows:

1. Form the 16x16 coefficient input matrix by combining four field blocks together as shown at 600.

2. For vertical downscaling and filtering, apply a low pass (LP) filter D according to Yim's algorithm to every row
of the input matrix. The LP input matrix is now 16x8 pixels, as shown at 610.

3. Form the 8x8 matrices $B_1$ and $B_2$ from the LP matrix ($[B_1\ B_2]$).

4. Perform a horizontal downscaling operation according to Dugad's algorithm on every column of $B_1$ and $B_2$ to
obtain the output matrix (8x8) (620) as follows:

$$B = B_1 (T_L T_4^{T})^{T} + B_2 (T_R T_4^{T})^{T} = (B_1 + B_2)E + (B_1 - B_2)O$$

where E and O denote even and odd rows as discussed above.
[0112] In particular, a horizontal downsampling matrix composed of odd "O" and even "E" matrices as follows may
be used (ignoring the scaling factor):

E = [e(0) 0 0 0,
0 e(1) 0 e(2),
0 0 0 0,
0 e(3) 0 e(4),
0 0 e(5) 0,
0 e(6) 0 e(7),
0 0 0 0,
0 e(8) 0 e(9)].

O = [0 0 0 0,
o(0) 0 o(1) 0,
0 o(2) 0 0,
o(3) 0 o(4) 0,
0 0 0 0,
o(5) 0 o(6) 0,
0 0 0 o(7),
o(8) 0 o(9) 0].

[0113] The coefficients as follows can be used:

e(0) = 4               o(0) = 2.56915448
e(1) = 0.831469612     o(1) = -0.149315668
e(2) = 0.045774654     o(2) = 2
e(3) = 1.582130167     o(3) = -0.899976223
e(4) = -0.195090322    o(4) = 1.026559934
e(5) = 2               o(5) = 0.601344887
e(6) = -0.704885901    o(6) = 1.536355513
e(7) = 0.980785280     o(7) = 2
e(8) = 0.906127446     o(8) = -0.509795579
e(9) = 1.731445835     o(9) = -0.750660555.
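Several of these coefficients can be reproduced under the assumption that, with the scaling factor ignored, each non-zero entry at row i and column j of E or O is the sum over n = 0..3 of cos((2n+1)iπ/16) x cos((2n+1)jπ/8), i.e. an unnormalised entry of the left half of the 8-point DCT multiplied by the transposed 4-point DCT. The function name m and the index mapping taken from the matrix layouts are illustrative.

```python
import math


def m(i, j):
    """Unnormalised entry (i, j): sum of cosine products over n = 0..3."""
    return sum(math.cos((2 * n + 1) * i * math.pi / 16) *
               math.cos((2 * n + 1) * j * math.pi / 8) for n in range(4))


# (row, column) positions read off the E and O layouts.
e = {0: m(0, 0), 1: m(1, 1), 2: m(1, 3), 3: m(3, 1), 4: m(3, 3), 5: m(4, 2)}
o = {1: m(1, 2), 2: m(2, 1), 3: m(3, 0), 4: m(3, 2), 5: m(5, 0)}

for k in sorted(e):
    print(f"e({k}) = {e[k]:.9f}")
for k in sorted(o):
    print(f"o({k}) = {o[k]:.9f}")
```

Under this assumption the sketch reproduces, for instance, e(0) = 4, e(1) = 0.831469612, o(1) = -0.149315668 and o(2) = 2 to the printed precision.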



[0114] Essentially, the product of a DCT matrix which is sparse is used as the downsampling matrix. 

[0115] The technique may be extended generally for 2:1 downsizing of an NxN block that comprises four N/2xN/2 

coefficient field-mode blocks. Other downsizing ratios may also be accommodated. 
6.6 Special cases

[0116] Special cases occur when the four MBs are not all coded in the same mode (not falling in any of the five previous
cases). We always assume that any intra or skipped MB among the other inter MBs is in inter mode with a zero MV. Field
MVs are merged based on section 6.2 to obtain a frame MV, and then we apply the techniques of section 6.1. MC is
recommended to determine the texture of the intra block, which is treated as an inter block with a zero MV by the
transcoder.

7. Conclusion 

[0117] It should now be appreciated that the present invention provides a transcoder architecture that provides the
lowest possible complexity with a small error. This error is generated in the MPEG-4 texture encoding process (QP
coding, DC prediction, nonlinear DC scaler). These processes should be removed in a future profile of MPEG-4 to
create a near-lossless transcoding system.

[0118] The invention also provides complete details of a size transcoder to convert a bitstream of ITU-R 601 interlaced
video coded with MPEG-2 MP@ML into a simple profile MPEG-4 bitstream which contains SIF progressive video
suitable for a streaming video application.

[0119] For spatial downscaling of field-mode DCT blocks, it is proposed to combine vertical and horizontal
downscaling techniques in a novel manner such that sparse downsampling matrices are used in both the vertical and
horizontal directions, thereby reducing the computations of the transcoder.

[0120] Moreover, for MV downscaling, we propose using a median value from the eight neighboring MVs. This proposal
works better than the algorithms in section 5.2 since our predicted MVs go with the global MV. It also works well with an
interlaced MB, which has only two MVs instead of four MVs per MB.

[0121] Although the invention has been described in connection with various preferred embodiments, it should be 
25 appreciated that various modifications and adaptations may be made thereto without departing from the scope of the 
invention as set forth in the claims. 

[0122] In accordance with an embodiment of the invention, a method for transcoding a p re-compressed input bit- 
stream that is provided in a first video coding format, comprises the steps of: recovering header information of the input 
bitstream; providing corresponding header information in a second, different video coding format; partially decompress- 
so ing the input bitstream to provide partially decompressed data; and re-compressing the partially decompressed data 
in accordance with the header information in the second format to provide an output bitstream. 
[0123] The first and second video coding formats can comprise an MPEG-2 format and an MPEG-4 format, respec- 
tively. 

[0124] For example, the first video coding format comprises MPEG-2 Main Profile at Main Level; and the second 
35 video coding form at com prises a simple profile MPEG-4 bitstream with Standard Intermediate Format (SIF) progressive 
video. 

[0125] The partially decompressed data can comprise motion vectors and Discrete Cosine Transform (DCT) coeffi- 
cients; and the second format can comprise at least one of a new mode decision, AC/DC prediction, and motion com- 
pensation. 

40 [0126] At least one look-up table may be used to provide the corresponding header information in the second video 
coding format. 

[0127] Downscaling can be performed on the partially decompressed data by downsampling DCT coefficients and 
motion vector data thereof. 

[0128] Also, 2:1 downscaling can be performed on at least one group of four field-mode Discrete Cosine Transform 
45 (DCT) blocks of the partially decompressed data by performing vertical downsampling and de-interlacing thereto to 
obtain a corresponding group of two frame-mode DCT blocks, and performing horizontal downsampling to the two 
frame-mode DCT blocks to obtain one frame-mode DCT block. 

[0129] The vertical downsampling also achieves low pass filtering of the four field-mode DCT blocks. 
[0130] The vertical and horizontal downsampling might use respective sparse matrixes. 
so [0131] In the recompressing step, a code (DQUANT) which specifies a change in a quantizer could be set according 
to a differential value of a quantization parameter of the partially decompressed data. 

[0132] For re-compressing intra coded macroblocks, a coded block pattern (CBP) can be set according to a corre- 
sponding value of the partially decompressed data. 

[0133] For re-compressing non-intra coded macroblocks, skipped macroblocks in the partially decompressed data
are preferably coded as not_coded macroblocks, where all Discrete Cosine Transform (DCT) coefficients have a zero
value.
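
The mapping of skipped macroblocks can be sketched as follows. The record type, its field names and the function name are hypothetical and serve only to illustrate the rule; a block size of 64 coefficients is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class OutputMacroblock:
    """Hypothetical record for one re-compressed non-intra macroblock."""
    not_coded: bool
    coefficients: list = field(default_factory=list)


def recompress_non_intra(skipped, coefficients):
    """Map one partially decoded macroblock to the hypothetical output record.

    A macroblock that was skipped in the input is emitted as not_coded
    with all coefficients zero; any other macroblock keeps its data.
    """
    if skipped:
        return OutputMacroblock(not_coded=True, coefficients=[0] * 64)
    return OutputMacroblock(not_coded=False, coefficients=list(coefficients))


mb = recompress_non_intra(True, [])
print(mb.not_coded, any(mb.coefficients))
```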

[0134] In the recompressing step, predicted motion vectors in the partially decompressed data could be reset
according to the second format.
[0135] Also, in the recompressing step, dual prime mode macroblocks of the partially decompressed data might be
converted into field-coded macroblocks.

[0136] An apparatus for transcoding a pre-compressed input bitstream that is provided in a first video coding format
may comprise: means for recovering header information of the input bitstream; means for providing corresponding 
header information in a second, different video coding format; means for partially decompressing the input bitstream 
to provide partially decompressed data; and means for re-compressing the partially decompressed data in accordance 
with the header information in the second format to provide an output bitstream. 



Claims

1. A method for performing 2:1 downscaling on video data, comprising the steps of: 

forming at least one input matrix of NxN Discrete Cosine Transform (DCT) coefficients from the video data by
combining four N/2xN/2 field-mode DCT blocks;

performing vertical downsampling and de-interlacing to the input matrix to obtain two N/2xN/2 frame-mode 
DCT blocks; 

forming an NxN/2 input matrix from the two frame-mode DCT blocks; and 

performing horizontal downsampling to the NxN/2 matrix to obtain one N/2xN/2 frame-mode DCT block. 

2. The method of claim 1, wherein N=16.

3. The method of claim 1 or 2, wherein: 

the vertical downsampling also achieves low pass filtering of the NxN input matrix.

4. The method of one of claims 1 to 3, wherein: 

the vertical downsampling uses a sparse downsampling matrix. 

5. The method of claim 4, wherein: 

the sparse downsampling matrix = $0.5\,[I_8 \;\; I_8]$, where $I_8$ is an 8x8 identity matrix.

6. The method of one of claims 1 to 5, wherein:

the horizontal downsampling uses a sparse downsampling matrix composed of odd "O" and even "E" matrices. 
7. The method of claim 6, wherein: 

the even matrix has the following form:

\[
E = \begin{bmatrix}
e(0) & 0 & 0 & 0 \\
0 & e(1) & 0 & e(2) \\
0 & 0 & 0 & 0 \\
0 & e(3) & 0 & e(4) \\
0 & 0 & e(5) & 0 \\
0 & e(6) & 0 & e(7) \\
0 & 0 & 0 & 0 \\
0 & e(8) & 0 & e(9)
\end{bmatrix}
\]

where e(1) through e(9) are non-zero coefficients; and

the odd matrix has the following form:

\[
O = \begin{bmatrix}
0 & 0 & 0 & 0 \\
o(0) & 0 & o(1) & 0 \\
0 & o(2) & 0 & 0 \\
o(3) & 0 & o(4) & 0 \\
0 & 0 & 0 & 0 \\
o(5) & 0 & o(6) & 0 \\
0 & 0 & 0 & o(7) \\
o(8) & 0 & o(9) & 0
\end{bmatrix}
\]

where o(1) through o(9) are non-zero coefficients.

8. An apparatus for performing 2:1 downscaling on video data, comprising:

means for forming at least one input matrix of NxN Discrete Cosine Transform (DCT) coefficients from the 
video data by combining four N/2xN/2 field-mode DCT blocks; 

means for performing vertical downsampling and de-interlacing to the input matrix to obtain two N/2xN/2
frame-mode DCT blocks;

means for forming an NxN/2 input matrix from the two frame-mode DCT blocks; and 

means for performing horizontal downsampling to the NxN/2 matrix to obtain one N/2xN/2 frame-mode DCT
block.

9. The apparatus of claim 8, wherein N=16.

10. The apparatus of claim 8 or 9, wherein: 

the means for performing vertical downsampling also achieves low pass filtering of the NxN input matrix. 

11. The apparatus of one of claims 8 to 10, wherein: 

the means for performing vertical downsampling uses a sparse downsampling matrix.

12. The apparatus of claim 11, wherein:

the sparse downsampling matrix = $0.5\,[I_8 \;\; I_8]$, where $I_8$ is an 8x8 identity matrix.

13. The apparatus of one of claims 8 to 12, wherein:



the means for performing horizontal downsampling uses a sparse downsampling matrix composed of odd "O" 
and even "E" matrices. 

14. The apparatus of claim 13, wherein: 

the even matrix has the following form:

\[
E = \begin{bmatrix}
e(0) & 0 & 0 & 0 \\
0 & e(1) & 0 & e(2) \\
0 & 0 & 0 & 0 \\
0 & e(3) & 0 & e(4) \\
0 & 0 & e(5) & 0 \\
0 & e(6) & 0 & e(7) \\
0 & 0 & 0 & 0 \\
0 & e(8) & 0 & e(9)
\end{bmatrix}
\]

where e(1) through e(9) are non-zero coefficients; and

the odd matrix has the following form:

\[
O = \begin{bmatrix}
0 & 0 & 0 & 0 \\
o(0) & 0 & o(1) & 0 \\
0 & o(2) & 0 & 0 \\
o(3) & 0 & o(4) & 0 \\
0 & 0 & 0 & 0 \\
o(5) & 0 & o(6) & 0 \\
0 & 0 & 0 & o(7) \\
o(8) & 0 & o(9) & 0
\end{bmatrix}
\]

where o(1) through o(9) are non-zero coefficients.